Class 3 – 11/5/2002

Special Variables

These are for your future reference! They are often very useful, but use with care.

$_

The default input and pattern-searching space. The following pairs are equivalent:

while (<>) { ... }
while ($_ = <>) { ... }

/^Subject:/
$_ =~ /^Subject:/

y/a-z/A-Z/
$_ =~ y/a-z/A-Z/

chop
chop($_)

(Mnemonic: underline is understood in certain operations.)

$.

The current input line number of the last filehandle that was read. Readonly. Remember that only an explicit close on the filehandle resets the line number. (Mnemonic: many programs use . to mean the current line number.)

$/

The input record separator, newline by default. Works like awk's RS variable, including treating blank lines as delimiters if set to the null string. You may set it to a multicharacter string to match a multi-character delimiter. Note that setting it to "\n\n" means something slightly different than setting it to "", if the file contains consecutive blank lines. Setting it to "" will treat two or more consecutive blank lines as a single blank line. Setting it to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / is used to delimit line boundaries when quoting poetry.)
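To make the difference between "" and "\n\n" concrete, here is a minimal sketch. It assumes Perl 5.8 or later, so that a filehandle can be opened on an in-memory string; the helper name split_records and the sample text are made up for illustration.

```perl
use strict;
use warnings;

## Hypothetical helper: read $text record by record, using $separator as $/.
sub split_records {
    my ($separator, $text) = @_;
    ## Open a read-only filehandle on the string itself (Perl 5.8+)
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    local $/ = $separator;   ## only affects reads made inside this sub
    my @records = <$fh>;
    close($fh);
    return @records;
}

## Two paragraphs separated by three blank lines
my $text = "first\n\n\n\nsecond\n";

my @paragraph_mode = split_records("",     $text);
my @two_newlines   = split_records("\n\n", $text);

print scalar(@paragraph_mode), " records with the null string\n";
print scalar(@two_newlines),   " records with two newlines\n";
```

With this sample text, paragraph mode should yield two records, while "\n\n" yields three: the extra record is an "empty paragraph" made of the leftover newlines.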

$,

The output field separator for the print operator. Ordinarily the print operator simply prints out the comma separated fields you specify. In order to get behavior more like awk, set this variable as you would set awk's OFS variable to specify what is printed between fields. (Mnemonic: what is printed when there is a , in your print statement.)

$"

This is like $, except that it applies to array values interpolated into a double-quoted string (or similar interpreted string). Default is a space. (Mnemonic: obvious, I think.)

$\

The output record separator for the print operator. Ordinarily the print operator simply prints out the comma separated fields you specify, with no trailing newline or record separator assumed. In order to get behavior more like awk, set this variable as you would set awk's ORS variable to specify what is printed at the end of the print. (Mnemonic: you set $\ instead of adding \n at the end of the print. Also, it's just like /, but it's what you get "back" from perl.)
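The three output separators are easiest to tell apart side by side. The following is a small sketch rather than anything from the class itself: the helper name render is hypothetical, and it captures the output in a string through an in-memory filehandle (Perl 5.8 or later) so the result is easy to inspect.

```perl
use strict;
use warnings;

## Hypothetical helper: print @fields twice into a string and return it.
sub render {
    my @fields = @_;
    my $out = '';
    open(my $fh, '>', \$out) or die "Can't open in-memory file: $!";
    {
        local $, = ':';    ## printed between the items of a print list
        local $\ = "\n";   ## printed at the end of every print
        local $" = '-';    ## printed between interpolated array elements
        print $fh @fields;      ## a list of items: joined with $,
        print $fh "@fields";    ## one interpolated string: joined with $"
    }
    close($fh);
    return $out;
}

print render('a', 'b', 'c');   ## prints a:b:c and then a-b-c, one per line
```

Using local keeps the changes confined to the enclosing block, which is one way to honor the "use with care" advice at the top of this section.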

$#

The output format for printed numbers. This variable is a half-hearted attempt to emulate awk's OFMT variable. There are times, however, when awk and perl have differing notions of what is in fact numeric. Also, the initial value is %.20g rather than %.6g, so you need to set $# explicitly to get awk's value. (Mnemonic: # is the number sign.)

$%

The current page number of the currently selected output channel. (Mnemonic: % is page number in nroff.)

$=

The current page length (printable lines) of the currently selected output channel. Default is 60. (Mnemonic: = has horizontal lines.)

$-

The number of lines left on the page of the currently selected output channel. (Mnemonic: lines_on_page - lines_printed.)

$~

The name of the current report format for the currently selected output channel. Default is name of the filehandle. (Mnemonic: brother to $^.)

$^

The name of the current top-of-page format for the currently selected output channel. Default is name of the filehandle with "_TOP" appended. (Mnemonic: points to top of page.)

$|

If set to nonzero, forces a flush after every write or print on the currently selected output channel. Default is 0. Note that STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe, such as when you are running a perl script under rsh and want to see the output as it's happening. (Mnemonic: when you want your pipes to be piping hot.)

$$

The process number of the perl running this script. (Mnemonic: same as shells.)

$?

The status returned by the last pipe close, backtick (``) command or system operator. Note that this is the status word returned by the wait() system call, so the exit value of the subprocess is actually ($? >> 8). $? & 255 gives which signal, if any, the process died from, and whether there was a core dump. (Mnemonic: similar to sh and ksh.)
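The bit arithmetic on the status word can be wrapped in a small function. This is a sketch under the usual Unix layout of the wait() status (exit value in the high byte, signal number in the low 7 bits, core-dump flag in bit 7); the name decode_status is hypothetical.

```perl
use strict;
use warnings;

## Hypothetical helper: split a wait() status word into its three parts.
sub decode_status {
    my ($status) = @_;
    my $exit_value  = $status >> 8;              ## high byte: exit value
    my $signal      = $status & 127;             ## low 7 bits: signal number, 0 if none
    my $dumped_core = ($status & 128) ? 1 : 0;   ## bit 7: core dump flag
    return ($exit_value, $signal, $dumped_core);
}

## Typical use right after system(); the command name is only a placeholder.
## system("some_command");
## my ($exit, $sig, $core) = decode_status($?);

my ($exit, $sig, $core) = decode_status(3 << 8);   ## as if the child called exit(3)
print "exit=$exit signal=$sig core=$core\n";        ## prints exit=3 signal=0 core=0
```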

$&

The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: like & in some editors.)

$`

The string preceding whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: ` often precedes a quoted string.)

$'

The string following whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: ' often follows a quoted string.) Example:

$_ = 'abcdefghi';

/def/;

print "$`:$&:$'\n"; # prints abc:def:ghi

$+

The last bracket matched by the last search pattern. This is useful if you don't know which of a set of alternative patterns matched. For example:

/Version: (.*)|Revision: (.*)/ && ($rev = $+);

(Mnemonic: be positive and forward looking.)

$*

Set to 1 to do multiline matching within a string, 0 to tell perl that it can assume that strings contain a single line, for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when $* is 0. Default is 0. (Mnemonic: * matches multiple things.) Note that this variable only influences the interpretation of ^ and $. A literal newline can be searched for even when $* == 0.

$0

Contains the name of the file containing the perl script being executed. Assigning to $0 modifies the argument area that the ps(1) program sees. (Mnemonic: same as sh and ksh.)

$<digit>

Contains the subpattern from the corresponding set of parentheses in the last pattern matched, not counting patterns matched in nested blocks that have been exited already. (Mnemonic: like \digit.)

$[

The index of the first element in an array, and of the first character in a substring. Default is 0, but you could set it to 1 to make perl behave more like awk (or Fortran) when subscripting and when evaluating the index() and substr() functions. (Mnemonic: [ begins subscripts.)

$]

The string printed out when you say "perl -v". It can be used to determine at the beginning of a script whether the perl interpreter executing the script is in the right range of versions. If used in a numeric context, returns the version + patchlevel / 1000. Example:

## see if getc is available
($version,$patchlevel) = $] =~ /(\d+\.\d+).*\nPatch level: (\d+)/;
print STDERR "(No filename completion available.)\n" if $version * 1000 + $patchlevel < 2016;

## or, used numerically,
warn "No checksumming!\n" if $] < 3.019;

(Mnemonic: Is this version of perl in the right bracket?)

$;

The subscript separator for multi-dimensional array emulation. If you refer to an associative array element as

$foo{$a,$b,$c}

it really means

$foo{join($;, $a, $b, $c)}

But don't put

@foo{$a,$b,$c} # a slice--note the @

which means

($foo{$a},$foo{$b},$foo{$c})

Default is "\034", the same as SUBSEP in awk. Note that if your keys contain binary data there might not be any safe value for $;. (Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon. Yeah, I know, it's pretty lame, but $, is already taken for something more important.)
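A short sketch of the emulation in practice: store a value under two subscripts, then split the joined key back apart. The hash name %grid and its contents are invented for the example.

```perl
use strict;
use warnings;

my %grid;
$grid{2, 5} = 'treasure';        ## really $grid{join($;, 2, 5)}

## Copy the separator into an ordinary variable so it is easy to
## interpolate; \Q...\E quotes any regex metacharacters it may contain.
my $sep = $;;

## Recover the individual subscripts from each stored key.
for my $key (keys %grid) {
    my ($row, $col) = split(/\Q$sep\E/, $key);
    print "row $row, column $col holds $grid{$key}\n";
}
```

This should print "row 2, column 5 holds treasure". The split only works reliably when the subscripts themselves never contain the separator, which is the caveat about binary data mentioned above.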

$!

If used in a numeric context, yields the current value of errno, with all the usual caveats. (This means that you shouldn't depend on the value of $! to be anything in particular unless you've gotten a specific error return indicating a system error.) If used in a string context, yields the corresponding system error string. You can assign to $! in order to set errno if, for instance, you want $! to return the string for error n, or you want to set the exit value for the die operator. (Mnemonic: What just went bang?)

$@

The perl syntax error message from the last eval command. If null, the last eval parsed and executed correctly (although the operations you invoked may have failed in the normal fashion). (Mnemonic: Where was the syntax error "at"?)
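In Perl 5, $@ also carries run-time errors (anything that would otherwise make the script die) raised inside an eval. Here is a minimal sketch of checking it after a block eval; the helper name safe_divide is hypothetical.

```perl
use strict;
use warnings;

## Hypothetical helper: divide two numbers, returning undef instead of dying.
sub safe_divide {
    my ($numerator, $denominator) = @_;
    my $result = eval { $numerator / $denominator };
    if ($@) {
        ## $@ holds the message from the failed eval
        warn "Division failed: $@";
        return;
    }
    return $result;
}

my $ok  = safe_divide(6, 3);   ## 2
my $bad = safe_divide(1, 0);   ## undef, with a warning on STDERR
print "ok=$ok bad=", (defined $bad ? $bad : "undef"), "\n";
```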

$<

The real uid of this process. (Mnemonic: it's the uid you came FROM, if you're running setuid.)

$>

The effective uid of this process. Example:

$< = $>; # set real uid to the effective uid

($<,$>) = ($>,$<); # swap real and effective uid

(Mnemonic: it's the uid you went TO, if you're running setuid.) Note: $< and $> can only be swapped on machines supporting setreuid().

$(

The real gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space separated list of groups you are in. The first number is the one returned by getgid(), and the subsequent ones by getgroups(), one of which may be the same as the first number. (Mnemonic: parentheses are used to GROUP things. The real gid is the group you LEFT, if you're running setgid.)

$)

The effective gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space separated list of groups you are in. The first number is the one returned by getegid(), and the subsequent ones by getgroups(), one of which may be the same as the first number. (Mnemonic: parentheses are used to GROUP things. The effective gid is the group that's RIGHT for you, if you're running setgid.)

Note: $<, $>, $( and $) can only be set on machines that support the corresponding set[re][ug]id() routine. $( and $) can only be swapped on machines supporting setregid().

$:

The current set of characters after which a string may be broken to fill continuation fields (starting with ^) in a format. Default is " \n-", to break on whitespace or hyphens. (Mnemonic: a "colon" in poetry is a part of a line.)

$^D

The current value of the debugging flags. (Mnemonic: value of -D switch.)

$^F

The maximum system file descriptor, ordinarily 2. System file descriptors are passed to subprocesses, while higher file descriptors are not. During an open, system file descriptors are preserved even if the open fails. Ordinary file descriptors are closed before the open is attempted.

$^I

The current value of the inplace-edit extension. Use undef to disable inplace editing. (Mnemonic: value of -i switch.)

$^L

What formats output to perform a formfeed. Default is \f.

$^P

The internal flag that the debugger clears so that it doesn't debug itself. You could conceivably disable debugging yourself by clearing it.

$^T

The time at which the script began running, in seconds since the epoch. The values returned by the -M, -A and -C filetests are based on this value.

$^W

The current value of the warning switch. (Mnemonic: related to the -w switch.)

$^X

The name that Perl itself was executed as, from argv[0].

$ARGV

Contains the name of the current file when reading from <>.

@ARGV

The array ARGV contains the command line arguments intended for the script. Note that $#ARGV is generally the number of arguments minus one, since $ARGV[0] is the first argument, NOT the command name. See $0 for the command name.

@INC

The array INC contains the list of places to look for perl scripts to be evaluated by the "do EXPR" command or the "require" command. It initially consists of the arguments to any -I command line switches, followed by the default perl library, probably "/usr/local/lib/perl", followed by ".", to represent the current directory.

%INC

The associative array INC contains entries for each filename that has been included via "do" or "require". The key is the filename you specified, and the value is the location of the file actually found. The "require" command uses this array to determine whether a given file has already been included.

$ENV{expr}

The associative array ENV contains your current environment. Setting a value in ENV changes the environment for child processes.

$SIG{expr}

The associative array SIG is used to set signal handlers for various signals. Example:

sub handler { # 1st argument is signal name
    local($sig) = @_;
    print "Caught a SIG$sig--shutting down\n";
    close(LOG);
    exit(0);
}

$SIG{'INT'} = 'handler'; ## Call the handler function upon SIGINT
$SIG{'QUIT'} = 'handler'; ## Call the handler function upon SIGQUIT
...
$SIG{'INT'} = 'DEFAULT'; # restore default action
$SIG{'QUIT'} = 'IGNORE'; # ignore SIGQUIT

The SIG array only contains values for the signals actually set within the perl script.

Misc file test operators (these don't belong here, but I needed to get them in)

-x

A file test. This unary operator takes one argument, either a filename or a filehandle, and tests the associated file to see if something is true about it. If the argument is omitted, tests $_, except for -t, which tests STDIN. It returns 1 for true and '' for false, or the undefined value if the file doesn't exist. Precedence is higher than logical and relational operators, but lower than arithmetic operators. The operator may be any of:

-r File is readable by effective uid/gid.

-w File is writable by effective uid/gid.

-x File is executable by effective uid/gid.

-o File is owned by effective uid.

-R File is readable by real uid/gid.

-W File is writable by real uid/gid.

-X File is executable by real uid/gid.

-O File is owned by real uid.

-e File exists.

-z File has zero size.

-s File has non-zero size (returns size).

-f File is a plain file.

-d File is a directory.

-l File is a symbolic link.

-p File is a named pipe (FIFO).

-S File is a socket.

-b File is a block special file.

-c File is a character special file.

-u File has setuid bit set.

-g File has setgid bit set.

-k File has sticky bit set.

-t Filehandle is opened to a tty.

-T File is a text file.

-B File is a binary file (opposite of -T).

-M Age of file in days when script started.

-A Same for access time.

-C Same for inode change time.
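A few of these tests combined in one place. This is only a sketch; the helper name describe_path and the category strings it returns are made up for the example.

```perl
use strict;
use warnings;

## Hypothetical helper: describe a path using a few of the file tests.
sub describe_path {
    my ($path) = @_;
    return "missing"    unless -e $path;            ## does not exist
    return "directory"  if -d $path;
    return "empty file" if -f $path && -z $path;    ## plain file, zero size
    return "file of " . (-s $path) . " bytes" if -f $path;
    return "something else";                        ## pipe, socket, device...
}

print ". is: ", describe_path("."), "\n";
print "/no/such/path is: ", describe_path("/no/such/path"), "\n";
```

Because the file tests bind more tightly than && and ||, an expression such as -f $path && -z $path needs no extra parentheses.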

File and Network I/O

Introduction

Since Perl was originally written as a report generator, it's not surprising that it can perform various I/O operations. In fact, since I/O is such an important part of Perl, it has more magic than most other parts. The special filehandles STDIN, STDOUT and STDERR come pre-opened, so you don't need to open them to use them.

Opening a File

You can open a file for input or output using the open() function.

## Open a file for reading
open(INFILE, "input.txt") or quit("Can't open input.txt: $!", 1);

## Open a file for writing – in overwrite mode
open(OUTFILE, "> output.txt") or quit("Can't open output.txt: $!", 1);

## Open a file for writing – in append mode
open(LOGFILE, ">> my.log") or quit("Can't open logfile: $!", 1);

In the first example above, INFILE is the filehandle. By convention, filehandles are written in all caps to set them apart. The second argument specifies the filename. By default, the file is opened for reading.

You can substitute any of the “real” filenames with a variable:

%conf = ( "logFile" => "/var/log/my.log" );
open(LOGFILE, ">> $conf{'logFile'}") or quit("Can't open logfile: $!", 1);

binmode(FILEHANDLE)

Arranges for the file to be read in "binary" mode in operating systems that distinguish between binary and text files. Files that are not read in binary mode have CR LF sequences translated to LF on input and LF translated to CR LF on output. Binmode has no effect under Unix.

Reading from a Filehandle

You can read from an open filehandle using the <> (diamond) operator. In scalar context it reads a single line from the filehandle, and in list context it reads the whole file in, assigning each line to an element of the list:

my $line = <INFILE>;
## $line is now the first (or next) line from INFILE
my @lines = <INFILE>;
## @lines is now a list of every remaining line from INFILE
$lines[616] = "hi there";

Reading in the whole file at one time is called slurping. It can be useful but it can be a memory hog. Most text file processing can be done a line at a time with Perl's looping constructs.

The <> operator is most often seen in a while loop:

while (<INFILE>) {
## assigns each line (one at a time) to $_
print "Just read in this line: $_";
}

...or when you need to read one line from STDIN:

print "Do you want to continue? [y/n] ";
my $answer = <STDIN>;
if ($answer =~ /y/i) {
## Do something here
}

## Side note from class: getc() can be used to read a single
## character from a filehandle.
my $character = getc(INFILE);

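## And the chomp function removes a trailing line ending (the value of $/,
## a newline by default) from a string. It changes the string in place and
## returns the number of characters removed, so don't assign its result back.
$line = "blah\n";
chomp($line); ## $line now is "blah"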

End of Files (EOF)

The eof function tests end-of-file status. Normally, it is invoked as eof(FILEHANDLE), which returns true if FILEHANDLE is currently at the end of file (i.e., if the next read would return the undefined value).

If you omit the FILEHANDLE argument, eof tests the last filehandle that was read from.
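One common use of eof is spotting the last line while looping. The sketch below reads from an in-memory string (Perl 5.8 or later) so it can run without a real file; the helper name number_lines is hypothetical.

```perl
use strict;
use warnings;

## Hypothetical helper: number the lines of $text and mark the last one.
sub number_lines {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    my @out;
    while (my $line = <$fh>) {
        chomp($line);
        ## $. is the current line number; eof($fh) is true once the
        ## next read would hit end of file
        my $marker = eof($fh) ? " (last line)" : "";
        push(@out, "$.: $line$marker");
    }
    close($fh);
    return @out;
}

print "$_\n" for number_lines("alpha\nbeta\ngamma\n");
```

For the three-line sample this should print "1: alpha", "2: beta" and "3: gamma (last line)".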

Writing to a Filehandle

We've already seen how to print to standard output using print(). However, print() can also take an optional first argument specifying which filehandle to print to:

print STDERR "This is your final warning.\n"; ## Prints to STDERR
print OUTFILE "$record\n"; ## Prints to the OUTFILE filehandle
print LOGFILE $logmessage; ## Prints to the LOGFILE filehandle

Closing a Filehandle

When you're done with your filehandles, you should close() them (Perl will clean up after you if you forget, but it's good practice to close your own filehandles):

close(LOGFILE);

Outgoing Network Sockets

One method of connecting to a remote server is:

## Connect to the remote server 192.168.1.16 on TCP port 80
use IO::Socket;
my $server = "192.168.1.16";
my $port = 80;
my $socket = IO::Socket::INET->new( PeerAddr => $server,
    PeerPort => $port,
    Proto => 'tcp',
    Autoflush => 1,
    Blocking => 1,
) or quit("$$ - $conf{'programName'} - ERROR - Connect to $server:$port failed. Error was: $!",1);

At this point $socket is a network socket that behaves exactly like a filehandle. Proto can be either "tcp" or "udp". Autoflush specifies that Perl should not buffer anything. By default Perl will buffer connection streams (for network and file I/O), which can make it seem like things aren't working when they really are. The Blocking option specifies whether or not to enable blocking reads from the filehandle. The default is to enable blocking reads. There are more options.

Writing to a network socket

This is exactly like writing to any other filehandle:

## Say hi to the remote server
print $socket "Hi there.. want to talk?\r\n";

Reading from a network socket

This is exactly like reading from any other filehandle:

## See what the server said
my $response = <$socket>;

You can disconnect from a remote server by simply closing the filehandle.

Incoming Network Sockets (writing server software)

One method of accepting an incoming network connection is like this:

## Open a network port and listen for incoming connections
use IO::Socket;
my $socket = IO::Socket::INET->new( LocalPort => $port,
    Proto => 'tcp',
    Listen => 10,
    Reuse => 1,
    Autoflush => 1,
    Blocking => 1,
) or quit("$$ - $conf{'programName'} - OS-ERROR - Failed to bind to tcp port $port. Error was: $!",1);

## Forever (while 1)
## 1. Accept an incoming connection
## 2. Read a message from the client
## 3. Send the client a nice message
## 4. Disconnect the client

while (1) {
    my $clientSocket = $socket->accept() or quit("$$ - $conf{'programName'} - OS-ERROR - \$socket->accept failed: $!", 1);
    my $message = <$clientSocket>;
    print $clientSocket "You said: $message\r\n";
    close $clientSocket;
}

I/O Special Variables

As you might expect, there are a number of special variables associated with I/O. They are:

$_

while (<FILEHANDLE>) reads the next line into $_ by default.

$/

The input record separator. This is a newline ("\n") by default.

Note that this is magical: if you set it to the empty string (""), it will behave as if you had set it to two newlines ("\n\n"), with this exception: two or more blank lines in a row will be compressed into one blank line. This makes it easy to read files one paragraph at a time.

$|

If set to a nonzero value, forces a flush every time you write to the currently-selected filehandle.

$,

Output field separator. When you print several items, separated by commas, Perl inserts the value of $, between each item.

$"

Like $,, but applies to arrays interpolated into a double-quoted string.

$.

The current input line number for the last filehandle that was read from.

$=

The number of lines per page on the currently-selected output channel.

$-

The number of lines left on the current output page.

$%

The current page number of the currently-selected output channel.

$~

The name of the current format for the currently-selected output channel.

$^

The name of the current top-of-page format for the currently-selected output channel.

$:

A string containing the characters after which it is okay to break a long line in a format, and start filling in continuation (^) fields. This is " \n-" by default.

$^L

The string that formats should output to produce a form feed. This is "\f" by default.

$^A

The current value of the write accumulator for format lines. See perlform(1) and perlfunc(1) for details.

$^I

The current value of the inplace-edit extension. If Perl is running with the -i command-line option, but no backup extension specified, $^I will be the empty string. If the -i option was not specified, $^I has the undefined value.

Subroutines (Functions)

Introduction

http://www.perldoc.com/perl5.8.0/pod/perlsub.html

A subroutine may be declared as follows:

sub NAME { ... }

and called as:

NAME(arg1, arg2, ...); ## Arguments are optional!
## or
&NAME(arg1, arg2, ...);

Any arguments passed to the routine come in as array @_. So to get “arg1” from the above example you would do something like this:

sub NAME {
    my $arg1 = $_[0];
    ## Or you could do this instead (better):
    ## my ($arg1, $arg2) = @_;
}

The return value of the subroutine is the value of the last expression evaluated, and can be either a list or a scalar value. Alternatively (and preferably), a return statement may be used to specify the returned value and exit the subroutine.

sub NAME {
    my ($arg1, $arg2) = @_;
    my $answer = $arg1 * $arg2;
    return($answer);
}
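
Both styles of returning a value, and a list-valued return, can be seen in this short sketch. The subroutine names min_max and double are invented for the example.

```perl
use strict;
use warnings;

## Hypothetical example: return the smallest and largest of a list of numbers.
sub min_max {
    my @numbers = sort { $a <=> $b } @_;
    return ($numbers[0], $numbers[-1]);   ## a two-element list
}

## No return statement: the value of the last expression is returned.
sub double {
    my ($n) = @_;
    $n * 2;
}

my ($min, $max) = min_max(7, 3, 9, 4);
print "min=$min max=$max double=", double($max), "\n";   ## prints min=3 max=9 double=18
```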

You can define functions wherever you like: the Perl compiler will find them during the compilation phase, and make them available to your code by the time the body of the program is executed. You don't have to worry about defining functions before calling them.

Example functions

## Function add, adds two numbers and returns the new number
sub add {
    my ($number1, $number2) = @_;
    my $result = $number1 + $number2;
    return($result);
}

Here is a real subroutine taken from one of my programs:

###############################################################################################
## FUNCTION:
## openLogFile ( $filename )
##
##
## DESCRIPTION:
## Opens the file $filename and attaches it to the filehandle "LOGFILE". Returns 0
## on success and non-zero on failure. Any generated error message will get set in
## global variable $!.
##
##
## Example:
## openLogFile ("/var/log/scanAlert.log");
##
###############################################################################################
sub openLogFile {

    ## Get the incoming filename
    my $filename = $_[0];

    ## Make sure our file exists, and if the file doesn't exist then create it
    if ( ! -f $filename ) {
        printmsg("NOTICE: The file [$filename] does not exist. Creating it now with mode [0600].", 0);
        open (LOGFILE, ">>$filename");
        close LOGFILE;
        chmod (0600, $filename);
    }

    ## Now open the file and attach it to a filehandle
    open (LOGFILE,">>$filename") or return (1);

    ## Put the file into non-buffering mode
    select LOGFILE;
    $| = 1;
    select STDOUT;

    ## Tell the rest of the program that we can log now
    $conf{'logging'} = "yes";

    ## Return success
    return(0);
}

Notice the comments at the top of the function - this is extremely important!

Built-in Functions

http://www.perldoc.com/perl5.6/pod/perlfunc.html

http://caspian.dotconf.net/menu/Links/Useful_Things/Tutorials/perl-all.html#pl-exp-arith.html

Homework

Create a script that can read from and write to a file. The actual opening and closing of files should be done in one or more separate subroutines.